In this Session of our R Workshop, we will cover the following introductory aspects of RStudio & Programming:
By the end of this Session, you will understand:
There are four main window panes in RStudio: the Source, the Console, the Environment, and the Output pane.
We’ll talk about scripts and Notebooks soon.
The Environment pane has four tabs available by default:
The Output pane has a variety of tabs that give access to different types of output, including files, plots, packages, help, and a couple of more specialized viewers. Tabs in this pane include:
# Let's create a little plot
# create some data
data <- data.frame(a = c(72, 41, 54, 36), b = c('East', 'West', 'North', 'South'))
# generate plot
barplot(data$a,
        names.arg = data$b,
        col = "blue",
        ylab = "# of Regional Managers")
Through RStudio, you can use and render R code in a variety of ways including:
Even though you will probably use scripts when you start out coding, for this workshop we are going to work in a notebook to take advantage of the ability to have readable text alongside runnable code.
Variables are named containers used to store information. What can we name them? Almost anything we want, according to the following rules:
Programming languages have reserved words. These are words that have a specific meaning to the language itself, so you cannot use them as variable names. Some examples in R include:
You can find a list of reserved words here, or by running ?Reserved in the Console.
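One way to check a candidate name against these rules is base R's make.names(), which turns any string into a syntactically valid name. If the result is unchanged, the name was already valid. A minimal sketch; the helper isValidName is our own invention, not part of R:

```r
# Hypothetical helper: TRUE if `name` is already a syntactically valid R name.
# make.names() leaves valid names alone and alters invalid ones,
# e.g. it adds an "X" prefix or appends "." to reserved words.
isValidName <- function(name) {
  make.names(name) == name
}

isValidName("myVariable")     # TRUE
isValidName("my.variable_2")  # TRUE: letters, numbers, dots, underscores
isValidName("2cool")          # FALSE: starts with a number
isValidName("my variable")    # FALSE: contains a space
isValidName("if")             # FALSE: reserved word
```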
For a variable to be useful, you must give it a value. To set a variable to a value, you use the assignment operator. You’ve seen something like this in math, for instance, \(x=5\). In R, to assign a value to a variable we use “<-”. We’ll talk more about operators later, but for now you know the assignment operator!
The code box below shows some example variables. Use CMD/CTRL + RETURN/ENTER to run a line of code.
# Set some variables
# See the variable value
In most programming languages, you use “=” as the assignment operator, like you see with i.am.happy above. While R lets you use “=”, the convention is to use “<-”. This is a holdover from S, the language R is derived from. Programmers are funny, funny people.
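A minimal sketch showing that both operators store a value in a variable (the variable names are just examples):

```r
# Preferred R style: the arrow assignment operator
favoriteNumber <- 7

# Also works, but less common in R code
favoriteColor = "green"

# Print the values to check them
print(favoriteNumber)  # [1] 7
print(favoriteColor)   # [1] "green"
```

One place where the two differ: inside a function call, “=” names an argument (as in col = "blue" in the barplot() call earlier), while “<-” always means assignment. That is one reason many R style guides recommend “<-” for assignment and “=” only for arguments.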
Using the rules listed above, which of the following passwords will work?
# Not sure which will run?
# Use the assignment operator to assign the value 5 to a variable using the names above. Feel free to test some you define yourself!
My_favoritePASSWORDofAlL.tymes <- 5
Data Types
We’ll talk about 4 main data types:
Numeric values can be integers, decimals (floating point numbers), or complex numbers. We will focus on integers and decimals, but be aware that R can handle complex values too.
Some languages strictly distinguish between integers (whole numbers) and floating point (decimal) values. R is more relaxed. When you store a number in R, it is stored as a double by default. "Double" is short for double-precision floating point, so it is just another word for a decimal value.
In most situations, you don’t have to worry about whether a number is an integer or a floating point value. However, R does recognize both integers and doubles: you can create an integer explicitly with the L suffix (for example, 5L), and functions such as is.integer() and as.integer() check or convert values. Read more about numbers in R here.
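A quick sketch of the difference between integers and doubles, using the base R functions typeof(), is.integer(), and as.integer() (the variable names are just examples):

```r
# By default, a whole number is stored as a double
x <- 5
typeof(x)        # "double"
is.integer(x)    # FALSE

# The L suffix asks R for an integer instead
y <- 5L
typeof(y)        # "integer"

# as.integer() converts a double to an integer by dropping the decimal part
z <- as.integer(3.9)
z                # 3
typeof(z)        # "integer"

# Both kinds still count as numeric
is.numeric(x)    # TRUE
is.numeric(y)    # TRUE
```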
Let’s look at some examples.
# Let's set some variables to an integer.
# Uncomment the line below.
# Look for a red x to the left of the line number
# Try to run the line of code.
# NoCommasReally <- -22,550,675
# Remember to re-comment the line because the error may keep the notebook from rendering.
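REMEMBER: don’t use commas in numbers in your code. R reads a comma as a separator (for example, between a function’s arguments), so -22,550,675 causes an error. Commas in numbers are only there to make them easier for us humans to read.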
# Now we'll define some floating point variables
# stillNoCommas <- 4,325.98
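For reference, here is a sketch of the same two values written the way R expects, plus one way to show commas when you print them. formatC() returns a character string, so the formatted version is for display only, not for arithmetic.

```r
# Write numbers without commas
NoCommasReally <- -22550675
stillNoCommas <- 4325.98
typeof(NoCommasReally)   # "double"

# To *display* a number with commas, format it as text
formatC(NoCommasReally, format = "d", big.mark = ",")              # "-22,550,675"
formatC(stillNoCommas, format = "f", digits = 2, big.mark = ",")  # "4,325.98"
```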
Use the typeof() function to check what data type R thinks a variable is.
We’ve looked at the scary, scary numbers. Let’s look at good old familiar text. Run the code in the block below. What data type does R see string data as? Hint: run the last line!!
# Some text examples
Compare how ughWhatNow is printed to how ughWhatNowPart2 is printed. What is the difference? Where did the two “\” come from?
The “\” is called an escape character. It tells R that the next character is part of the string’s content rather than, for example, the quote that marks the beginning or end of the string. You also use it to write non-printing characters; for example, “\n” is a line feed (a new line). In this case, we want a string that includes quoted text, but we also use quotes to tell R where a string starts and ends. The example above shows that R displays strings in double quotes and escapes any double quotes inside them with “\”; the backslashes are not actually part of the text. If you need to include quoted text, using single quotes inside the string avoids the escapes.
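A small sketch of how escapes show up when printing (the string and variable name are just examples). print() displays a string as R code, while cat() displays the characters themselves:

```r
# A string delimited by single quotes can contain double quotes directly
quoted <- 'She said "hi"'

# print() wraps the string in double quotes and escapes the inner ones as \"
print(quoted)      # [1] "She said \"hi\""

# cat() shows the actual characters, with no escapes
cat(quoted, "\n")  # She said "hi"

# The backslashes are not part of the string: 13 characters in total
nchar(quoted)      # 13

# The same string written with double quotes needs the escapes
identical(quoted, "She said \"hi\"")  # TRUE

# \n is a non-printing character: a line feed (new line)
cat("line one\nline two\n")
```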
Dates are their own special kind of special. The Date/Time data type is useful for:
We can define a date as a string. The computer doesn’t care:
Why would we want to use a date data type? See above, or try the code below. Uncomment the line that starts with letsAdd4Days and try to run it.
# To explore where date strings break, uncomment the next line and try to run it. We are just trying to find out what the date is 4 days after our date string:
# letsAdd4Days <- aDateButItsReallyAString + 4
# !! Re-comment it when you are done. Otherwise your notebook won't render.
Using text dates has its limitations. What if you didn’t want to print the date as “07/15/2025”, but wanted to show “July 15, 2025” instead?
That’s where the date data type comes in.
# Let's try some dates
What is going on with the output on the third line? Good question! Look at the format string passed to the as.Date() function. Compare the formats used to define myDateDate and myDateDate2. What’s different?
The question is: can we add 4 days to myDateDate now?
Yes!
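Here is a sketch of the whole round trip: text to Date, date arithmetic, and custom display. The variable names are our own, not the ones from the chunk above. Note that %B (the full month name) depends on your computer’s locale, so the month may print in another language.

```r
# A date stored as text
textDate <- "07/15/2025"

# Convert it to a Date: %m = month, %d = day, %Y = four-digit year
realDate <- as.Date(textDate, format = "%m/%d/%Y")
realDate                       # [1] "2025-07-15"

# Date arithmetic works in days
realDate + 4                   # [1] "2025-07-19"

# format() controls how the date is displayed
format(realDate, "%B %d, %Y")  # "July 15, 2025" in an English locale

# A format that doesn't match the text gives a wrong date, not an error:
# %y (two-digit year) reads only "20" from "2025" and ignores the rest
as.Date(textDate, format = "%m/%d/%y")  # [1] "2020-07-15"
```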
A Boolean value is one that can take only one of two values: true or false, often written as 1 or 0. In R, these values are represented by the “logical” data type, which takes the value TRUE (1) or FALSE (0). Logical values are good for setting flags and for creating indices to subset data.
# Example of boolean
When we get to the section on operators, you will begin to see how much we use logical values in programming. This makes sense when you remember that, at the hardware level, computers work with two states, voltage off and voltage on, which get encoded as 0 and 1. So, of course we use logical values a lot: they are the most basic thing computers can manipulate.
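A short sketch of logical values in action (the variable names and temperatures are just examples). The subsetting line borrows the > operator, which the operators section covers in detail:

```r
isRaining <- TRUE
typeof(isRaining)            # "logical"

# TRUE and FALSE behave like 1 and 0 in arithmetic
as.numeric(c(TRUE, FALSE))   # 1 0
sum(c(TRUE, FALSE, TRUE))    # 2: counts the TRUE values

# A logical vector can pick out elements (subsetting)
temps <- c(68, 75, 81, 59)
temps > 70                   # FALSE TRUE TRUE FALSE
temps[temps > 70]            # 75 81
```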
# Create a variable and set it to some text.
# Create a variable and set it to a numeric value
# Create a variable and set it to a date
# Create a text variable and set it to a text date.
# Create a date variable and convert the text date to a date using `as.Date()`
# Print your date variable to the screen
So, we’ve looked at the basic data types in R. In this section, we’ll explore the ways values can be stored in R. When we talk about how data is organized and stored in a programming language, we are talking about data structures.
We will explore three widely used data structures:
Basic distinguishing factors between data structures include:
There are two aspects of using data structures:
We’ll cover both.
Vectors store data in one dimension and allow only one data type to be stored.
In point of fact, R is a vectorized language. What does that mean? It means that R has no true scalars (single values): a single value is really a vector of length 1, and many operations work on a whole vector at once. Will this have a big impact on learning R? No, probably not. But you can whip this fact out at parties and amaze your friends.
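A tiny sketch you can use for that party trick (the values are just examples):

```r
# A "single" value is a vector of length 1
x <- 42
is.vector(x)   # TRUE
length(x)      # 1

# Operations apply to every element at once, with no loop needed
prices <- c(2.50, 4.00, 1.25)
prices * 2     # 5.0 8.0 2.5
```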
Creating vectors is shown in the code below. We’ll make one for integers, floating point values, text and Boolean. No dates.
# Let's clean up the variables to make it easier to see what's going on
rm(list=ls())
# EXAMPLES OF HOW TO CREATE VECTORS
# (Single Data Type)
# Integer. Ok, fine, double.
# Decimal. Ok, fine, double.
# Text
# Boolean/Logical
# Into the weeds we go ....
# Also a vector:
I said vectors hold only one data type, and then I went and loaded one up with all sorts of data types. This was to prove a point: just because you CAN do a thing doesn’t mean you should. As you can see, R quietly converts (coerces) every value to a string. This is great if this is what you wanted and expected to happen. If not, it could break your code, for example when you later try to do arithmetic on what you thought were numbers.
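A sketch of the coercion rules with made-up values. R picks the most flexible type in the mix: logical becomes numeric, and anything mixed with text becomes character.

```r
# Mixing types in c() forces everything to one common type
mixed <- c(42, "hello", TRUE)
mixed                 # "42" "hello" "TRUE"
typeof(mixed)         # "character"

# Numbers and logicals combine into numbers (TRUE becomes 1)
numsAndFlags <- c(3.5, TRUE, FALSE)
numsAndFlags          # 3.5 1.0 0.0
typeof(numsAndFlags)  # "double"

# Arithmetic on the character version fails:
# mixed[1] + 1  # Error: non-numeric argument to binary operator
```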
Elements in a vector can be accessed by putting an index, or a range of indices, between square brackets. Note that R starts counting at 1, not 0. Let’s try it out.
# Make sure you ran the code above to define the vectors. Click on the green run button in the code chunk above.
# VECTORS
# Let's Get Fancy!
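A few more indexing patterns, sketched on a made-up vector so the block runs on its own:

```r
scores <- c(10, 20, 30, 40, 50)

scores[1]               # 10: R counts from 1, not 0
scores[2:4]             # 20 30 40: a range of indices
scores[c(1, 5)]         # 10 50: specific positions
scores[-1]              # 20 30 40 50: a negative index drops that element
scores[length(scores)]  # 50: the last element
```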
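Matrices can have 2 or more dimensions and hold only one data type. (Strictly speaking, R calls a 2-dimensional structure a matrix and one with more dimensions an array.) They are useful in computational work. There are a couple of ways to declare a matrix in R.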
If you only need a 2-dimensional matrix (rows and columns only), you
can use matrix(). Let’s see an example below:
# Create a matrix
# Create a matrix filled with zeros
# that you can fill in later.
# Ooh let's set an element to some other value...
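A sketch of how matrix() arranges values and how to reach a single element (the names m and z are just examples). By default, matrix() fills column by column:

```r
# matrix() fills values column by column by default
m <- matrix(1:6, nrow = 2, ncol = 3)
m
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

dim(m)    # 2 3: rows, then columns
m[2, 3]   # 6: row 2, column 3

# byrow = TRUE fills row by row instead
matrix(1:6, nrow = 2, byrow = TRUE)

# Start with zeros and fill in later
z <- matrix(0, nrow = 2, ncol = 2)
z[1, 2] <- 7
```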
To create a matrix that has more than 2 dimensions, you use the
array() function. Let’s give it a try.
# Set a value
# Set the value in the 3rd row, 5th column, and 2nd layer (third dimension) of myArray to your favorite number. Print out myArray to see the change
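A sketch of a 3-dimensional array, using a hypothetical name (exampleArray) so it doesn’t clash with your exercise. The dimensions are listed as rows, columns, then layers, and indexing follows the same order:

```r
# dim = c(rows, columns, layers)
exampleArray <- array(0, dim = c(3, 5, 2))
dim(exampleArray)     # 3 5 2

# Index in the same order: [row, column, layer]
exampleArray[3, 5, 2] <- 42
exampleArray[, , 2]   # shows only the second layer

# Like a matrix, an array holds a single data type
typeof(exampleArray)  # "double"
```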
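Dataframes store data in rows and columns, like a spreadsheet. Unlike matrices, each column of a dataframe can hold a different data type, and dataframes can have row and column names.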
# Create a dataframe
# extract data from dataframe
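A sketch of common ways to pull data out of a dataframe, using made-up regional data. One assumption to note: in R versions before 4.0, text columns became factors by default, so class() reports "factor" instead of "character" there.

```r
offices <- data.frame(
  region   = c("East", "West", "North"),
  managers = c(72, 41, 54)
)

offices$managers                   # a whole column, as a vector
offices[2, "region"]               # a single value: row 2, column "region"
offices[offices$managers > 50, ]   # rows that match a condition
nrow(offices)                      # 3

# Each column keeps its own type
sapply(offices, class)             # region: "character", managers: "numeric"
```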
# Create some data to use for the problems below. Make sure you run this line of code
data <- 1:24
# Create an array/vector that holds 4 words
# Create a two-dimensional matrix and fill it with `data`
# Try using dimensions that fit the data and dimensions that don't
# nrow times ncol should equal the number of elements in the data you pass to matrix(). Otherwise R recycles or drops values, usually with a warning.
# Create a 3-dimensional matrix and fill it with `data`
# Try using dimensions that fit the data and dimensions that don't
# Make a 3-dimensional matrix and fill it with zeros
# pick whatever dimensions you want, but don't break your computer with huge values!
# Make a dataframe with two rows and three columns of data
Operators are used to perform … well, … operations. They let you calculate with, compare, and combine values and variables. Operators may sound exotic, but you’ve encountered them in every basic math class you’ve ever taken. We’ve already covered the assignment operator. Quick: what is the assignment operator?
We’ll cover the following kinds of operators:
Arithmetic,
Relational, and
Logical.
I’m sure you can already guess what these are:
| Operator | Unary / Binary | Operation | Example |
|---|---|---|---|
| - | U | negation | -5 |
| + | B | addition | amt1 + amt2 |
| - | B | subtraction | amt1 - amt2 |
| * | B | multiplication | 2*amt1 |
| / | B | division | amt1/2 |
| ^ or ** | B | power | 2^3 |
| %% | B | modulus | 10 %% 3 |
| %/% | B | integer division | 10 %/% 3 |
Operators can be referred to as unary or binary. From math class, you may recall that operators act upon operands. Unary means that an operator takes a single operand, while binary means it takes two. The - operator is a great example because it can be either: in -5 it is unary (negation), and in amt1 - amt2 it is binary (subtraction). It is not essential that you know this, but if you hear these terms in the future you’ll know what they mean.
# Let's clear out our variables
rm(list=ls())
# Unary
# Binary
Most of the arithmetic operators should be familiar. Examples are provided below:
# Use familiar operators with values
# Separate Output
print('* * * * * * * * * * *')
## [1] "* * * * * * * * * * *"
# Use familiar operators with variables
# We'll be creative and use new values
# Declare some variables and set them to values
# Apply operators
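Here is a sketch that runs every operator from the arithmetic table, using amt1 and amt2 from the table’s examples with made-up values:

```r
amt1 <- 12
amt2 <- 5

-amt1         # -12 (unary negation)
amt1 + amt2   # 17
amt1 - amt2   # 7 (binary subtraction)
2 * amt1      # 24
amt1 / 2      # 6
2^3           # 8
2 ** 3        # 8: R also accepts ** for powers
```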
However, some operators may be new. These include the modulus operator, %%, and the integer division operator, %/%. The integer division operator returns the whole-number part of the quotient, and the modulus operator returns the remainder. For example, 10 %/% 3 is 3 and 10 %% 3 is 1, because 3 goes into 10 three times with 1 left over.
myNumber <- 10
myFirstDivisor <- 2 # divides evenly
myOtherDivisor <- 3 # doesn't divide evenly
# Integer Division & Modulus using first divisor
# Separate Output
print("* * * * * * * * * *")
## [1] "* * * * * * * * * *"
# Integer Division & Modulus using second divisor
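Using the same values as the chunk above (redefined here so the block runs on its own), the results work out like this. The last line shows that the quotient and remainder always rebuild the original number:

```r
myNumber <- 10

myNumber %/% 2   # 5: quotient
myNumber %% 2    # 0: no remainder, 2 divides 10 evenly

myNumber %/% 3   # 3: 3 goes into 10 three times
myNumber %% 3    # 1: with 1 left over

# quotient * divisor + remainder gives back the original number
(myNumber %/% 3) * 3 + myNumber %% 3   # 10
```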
Relational operators are all binary and are used to compare two
values. The result of using relational operators is a logical (Boolean)
value: TRUE or FALSE.
| Operator | Operation | Example |
|---|---|---|
| < | less than | mynum < 3 |
| <= | less than or equal to | mynum <= 3 |
| > | greater than | mynum > 3 |
| >= | greater than or equal to | mynum >= 3 |
| == | equality | "a" == "b" |
| != | inequality | "a" != "b" |
# Let's clear out our variables
rm(list=ls())
# Define some variables and set them to a value
bigWolf <- 135 # this is the weight of a wolf in pounds
medWolf1 <- 85 # this is an average sized wolf
medWolf2 <- 85.000000000001 # this is another average sized wolf
smallWolf <- 50 # small wolf
myGermanShepard <- 85 # weight of my dog, if I had one
# Let's compare my shepherd to the different wolves
# Let's look at `==` more closely
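A sketch of some of these comparisons, using the wolf weights from the chunk above (redefined so the block runs on its own). Notice that == treats medWolf2 as different even though the gap is tiny:

```r
bigWolf <- 135
medWolf1 <- 85
medWolf2 <- 85.000000000001
myGermanShepard <- 85

myGermanShepard < bigWolf     # TRUE
myGermanShepard == medWolf1   # TRUE
myGermanShepard == medWolf2   # FALSE: differs in the 12th decimal place
myGermanShepard != medWolf2   # TRUE
```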
It is a bad idea to test floating point numbers for exact equality with ==. Computers store these numbers in binary, and many decimal values (such as 0.1) cannot be represented exactly, so small rounding errors creep into calculations. It is standard practice to check whether two values are within a small tolerance of each other instead. See the example below:
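a <- 0.1 + 0.2
b <- 0.3
a == b # exact comparison fails because of a tiny rounding error
## [1] FALSE
epsilon <- 1e-9 # a very small tolerance
abs(a - b) < epsilon # if the difference is smaller than the tolerance, assume equality
## [1] TRUE
print(a - b)
## [1] 5.551115e-17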
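Base R also has a built-in tolerance check, all.equal(). It returns TRUE when two numbers are close (the default tolerance is about 1.5e-8) and a text description of the difference otherwise, so it is usually wrapped in isTRUE() to get a plain TRUE or FALSE. A short sketch, using the classic example that 0.1 + 0.2 is not exactly 0.3 in binary floating point:

```r
a <- 0.1 + 0.2
b <- 0.3

a == b                      # FALSE
isTRUE(all.equal(a, b))     # TRUE: equal within the default tolerance

all.equal(1, 1.1)           # "Mean relative difference: 0.1", not FALSE
isTRUE(all.equal(1, 1.1))   # FALSE
```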
Logical operators combine or negate logical values, including the results of relational expressions.
| Operator | Operation | Example |
|---|---|---|
| & | Element-wise AND | c(TRUE, TRUE, FALSE) & c(FALSE, TRUE, FALSE) |
| && | Logical AND (single values) | isCute && isBig |
| \| | Element-wise OR | c(TRUE, TRUE, FALSE) \| c(FALSE, TRUE, FALSE) |
| \|\| | Logical OR (single values) | isCute \|\| isBig |
| ! | Logical NOT | !isCute |
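A sketch of the difference between the single and double versions, using the isCute and isBig names from the table with made-up values. The single versions compare vectors position by position; the double versions work on single values and stop as soon as the answer is known. In R 4.3.0 and later, giving && or || a vector longer than one element is an error.

```r
# Element-wise: compares position by position, returns a vector
c(TRUE, TRUE, FALSE) & c(FALSE, TRUE, FALSE)   # FALSE  TRUE FALSE
c(TRUE, TRUE, FALSE) | c(FALSE, TRUE, FALSE)   #  TRUE  TRUE FALSE

# Single values: && and || return one TRUE or FALSE
isCute <- TRUE
isBig <- FALSE
isCute && isBig   # FALSE
isCute || isBig   # TRUE
!isCute           # FALSE

# && stops early: the right side is never evaluated here,
# so the (hypothetical) undefined variable causes no error
FALSE && notDefinedAnywhere   # FALSE
```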
# Use variables from last section
# Define some variables to compare
numWolves <- 50
numDogs <- 25
numCats <- 100
# Define some Boolean variables and set them
isParty <- FALSE
haveHotdogs <- TRUE
haveCheetos <- TRUE
# Check if numWolves is less than numDogs AND numCats is less than numDogs
# Check if numDogs is less than or equal to 25 OR numCats is less than or equal to 25
# Check if numDogs is less than 150 OR numCats is less than 150
# See if isParty is TRUE AND if numDogs is greater than 0 AND haveHotdogs is TRUE
# See if (numDogs is greater than zero AND haveHotdogs is TRUE) OR (numCats is greater than 0 AND haveCheetos is TRUE)
# How could you fix the expression above to make sure that you have food available for cats and dogs who come to your party?
# PROBLEM 1
# You need to see if you have at least as many buns as hotdogs. Create two variables, numBuns and numHotdogs, and set them to any values you want. Create a variable `haveEnoughBuns` and write an expression that sets `haveEnoughBuns` to TRUE if you have enough. Try different values for `numBuns` and `numHotdogs`, if you want.
# Declare `numBuns` and `numHotdogs`
# Check if you have enough buns.
# PROBLEM 2
# We need to make sure we have potato chips for our guests for our Friday night party
# Declare the following variables:
#
# `perfectChipsPortion`: set it to the number of grams a perfect portion of potato chips would be
# `numGuests`: set it to the number of guests you've invited
# `chipsNeededInGrams`: calculate the number of chips you need in grams
# `numBagOChips`: set it to the number of bags of chips you picked up
# `gramsPerBag`: set this to the number of grams of chips per bag
# `totalChipsInGrams`: calculate the number of grams you bought
# `haveEnoughChips`: compare the number of chips you need to the number you have (in grams)
# How many grams of chips do I need?
# How many grams of chips do I have?
# Do I have enough chips? Drum roll please...
# PROBLEM 3
# Use a logical operator to check whether you have enough buns for your hotdogs and enough chips for your guests. Declare a variable to store the result.
# PROBLEM 4
# Use the modulus operator and a relational operator to test if the variable `isThisEven` is an even number. Then, write a statement that creates a logical variable to hold the result.
# This is the number to check
# First use the modulus operator to get the remainder of dividing the number by 2, then check if the remainder is zero. This can be done in two steps or one
# Two Steps
# One step
In our next workshop, Control Flow and Functions, we’ll learn about and practice:
Let us know how this Workshop went for you! We take your input seriously and appreciate the time you take to give it. Please fill out our survey.
We are available to help you with whatever questions you have about this Workshop or programming in general. As you start to create programs for yourself, you will run into snags and questions (we all do), so reach out to us. Send an email, make an appointment, or reserve a workstation! We are here to help you succeed in your data science journey.